Skip to content

Conversation

@germa89
Copy link
Collaborator

@germa89 germa89 commented Nov 18, 2025

Description

This PR adds a background monitoring thread to the _multi_connect method in MapdlGrpc that actively checks if the MAPDL process is alive during connection attempts. This enhancement allows PyMAPDL to exit early if the MAPDL process dies, rather than waiting for the full timeout period.

Key Changes

Modified _multi_connect method in src/ansys/mapdl/core/mapdl_grpc.py:

  • Added a monitoring thread that runs in parallel with connection attempts
  • Thread checks process status using _check_process_is_alive every 0.5 seconds
  • Early exit when process death is detected, reducing wait time from full timeout (~15-30s) to ~1-2 seconds
  • Proper thread cleanup using threading.Event and join() mechanisms
  • Monitoring only activates for local MAPDL instances with available process and path information

Test Coverage:

Added 7 comprehensive unit tests in tests/test_launcher.py:

  1. test_multi_connect_with_valid_process - Tests successful connection with alive process (requires local MAPDL)
  2. test_multi_connect_early_exit_on_process_death - Tests early exit when process dies during connection (requires local MAPDL)
  3. test_multi_connect_monitoring_conditions - Parametrized test for different monitoring start conditions
  4. test_multi_connect_monitoring_thread_cleanup - Verifies thread is properly cleaned up
  5. test_multi_connect_monitor_detects_process_death - Tests monitor detection capability with mocks
  6. test_multi_connect_with_successful_connection_stops_monitoring - Verifies monitor stops after successful connection
  7. test_multi_connect_remote_no_monitoring - Tests that monitoring doesn't start for remote instances

Benefits

  • Faster failure detection: Users no longer wait for full timeout when MAPDL fails to start
  • Better user experience: Immediate feedback when process dies during connection
  • Resource efficiency: No unnecessary waiting when process has already failed
  • Non-intrusive: Only monitors local instances, doesn't affect remote connections
  • Thread-safe: Proper cleanup ensures no resource leaks

Issue linked

This PR addresses the issue where PyMAPDL waits for the full timeout period even when the MAPDL process has already died during connection attempts, particularly when using launch and discovery methods.

Checklist

This commit adds a background monitoring thread to the _multi_connect method that checks if the MAPDL process is alive during connection attempts. This allows PyMAPDL to exit early if the process dies, rather than waiting for the full timeout period.

Key changes: Added monitoring thread in _multi_connect that runs in parallel with connection attempts. Thread checks process status every 0.5 seconds. Early exit when process death is detected. Proper thread cleanup with Event and join mechanisms. Only monitors local MAPDL instances.

Tests added: 7 new tests covering valid connections, early exit on death, monitoring conditions, thread cleanup, process death detection, successful connection cleanup, and remote instance behavior.
@germa89 germa89 requested a review from a team as a code owner November 18, 2025 10:24
@github-actions github-actions bot added the new feature Request or proposal for a new feature label Nov 18, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a background monitoring thread in the _multi_connect method that actively checks if a local MAPDL process is alive during connection attempts. This enables PyMAPDL to exit early if the MAPDL process dies, reducing wait time from a full timeout (~15-30s) to approximately 1-2 seconds. The monitoring thread runs in parallel with connection attempts, checking process status every 0.5 seconds using _check_process_is_alive, and only activates for local MAPDL instances with available process and path information.

Key Changes

  • Added monitoring thread to _multi_connect in src/ansys/mapdl/core/mapdl_grpc.py with proper cleanup using threading.Event and join() mechanisms
  • Implemented 7 comprehensive unit tests covering various scenarios including process death detection, monitoring conditions, thread cleanup, and remote instance behavior
  • Thread monitors process status every 0.5 seconds and triggers early exit when process death is detected

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/ansys/mapdl/core/mapdl_grpc.py Adds monitoring thread to _multi_connect method with proper cleanup and early exit logic when MAPDL process dies during connection
tests/test_launcher.py Adds 7 unit tests validating monitoring thread functionality, early exit behavior, thread cleanup, and proper activation conditions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


# Start the monitoring thread
monitor_thread = None
if self._local and self._mapdl_process:
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The monitoring thread starts when self._local and self._mapdl_process are truthy, but inside the thread function at line 634, it also checks for self._path. This creates an inconsistency where the thread starts but may not actually monitor anything if self._path is None. Consider adding and self._path to the condition at line 654 to match the monitoring condition inside the thread.

Suggested change
if self._local and self._mapdl_process:
if self._local and self._mapdl_process and self._path:

Copilot uses AI. Check for mistakes.
process.kill()
try:
process.wait(timeout=2)
except:
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a bare except: clause catches all exceptions including SystemExit and KeyboardInterrupt, which can make debugging difficult. Use except Exception: instead to catch only standard exceptions.

Suggested change
except:
except Exception:

Copilot uses AI. Check for mistakes.
Comment on lines 2627 to 2632
# Don't raise exception - process is alive

Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on line 2626 doesn't fully explain the mock's purpose. Consider expanding it to clarify that this mock simulates an alive process by not raising an exception, allowing the test to verify that the monitoring stops after successful connection.

Suggested change
# Don't raise exception - process is alive
# This mock simulates an alive process by not raising an exception.
# This allows the test to verify that the monitoring thread stops
# after a successful connection, as no process death is detected.

Copilot uses AI. Check for mistakes.
time.sleep(1.0)

# Count should not increase much after connection success
# (maybe 1-2 more checks before thread stops)
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The magic number 3 lacks explanation. Consider adding a comment explaining why up to 3 additional checks are acceptable after connection success (e.g., due to race conditions between connection success and thread termination).

Suggested change
# (maybe 1-2 more checks before thread stops)
# (maybe 1-2 more checks before thread stops)
# Allow up to 3 additional checks after connection success due to possible
# race conditions between connection success and monitoring thread termination.

Copilot uses AI. Check for mistakes.
- Add _path check to monitoring thread start condition (requires both process and path)
- Mock _check_process_is_alive in all tests to prevent Mock.poll() issues
- All 8 tests now pass successfully (2 skipped on non-local)
@germa89 germa89 force-pushed the feat/early-exiting-timeout branch from ddc43da to 0cfeba8 Compare November 20, 2025 14:04
@codecov
Copy link

codecov bot commented Nov 20, 2025

Codecov Report

❌ Patch coverage is 87.23404% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.79%. Comparing base (be80897) to head (3b6c0ad).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4310      +/-   ##
==========================================
- Coverage   91.28%   89.79%   -1.50%     
==========================================
  Files         193      193              
  Lines       15742    15775      +33     
==========================================
- Hits        14370    14165     -205     
- Misses       1372     1610     +238     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new feature Request or proposal for a new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants